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Research Accomplishments DTIC QUALITY INSPECTED 3 


In this work we have pursued the development of parallel algorithms for matrix compu- 
tations. We highlight a few of these activities below, and include a complete list of the 
publications resulting from this research grant. 


Bounds on Scaled Projectors. Scaled projection operators arise in computations 
for weighted least squares problems, linear programming algorithms, and other appli- 
cations. Let X be a matrix of full column rank and let D be a diagonal matrix with 
positive diagonal elements. We have shown that the weighted pseudo-inverse defined 
by x} = (XTDX)-1XTD and the associated oblique projection Pp = xx} r, have 
norms bounded by numbers that are independent of D. 

Iteratively Reweighted Least Squares. We studied various algorithms to obtain 
estimates of solution vectors and residual vectors for the linear model Ar = b+€e = Dirue 
using an iteratively reweighted least squares criterion, which tends to diminish the in- 
fluence of outliers compared with the standard least squares criterion. Algorithms ap- 
propriate for dense and sparse matrices were developed. Solving Newton’s linear system 
using updated matrix factorizations or the (unpreconditioned) conjugate gradient iter- 
ation gave the most effective algorithms. Four weighting functions were compared, and 
results were obtained for sparse well-conditioned and ill-conditioned problems. 

Subspace updating. An important problein in array signal processing is to de- 
termine the null space of a matrix of signals sampled at discrete times by an array of 
m sensors. It is necessary that the subspace be updated in real time. The customary 
approach has been through the singular value decomposition; however, this decom- 
position cannot be updated exactly in fewer than O(m?) operations, and all parallel 
algorithms proposed for approximate updating require O(m?) processors. Recently we 
have introduced a new decomposition—the URV decomposition — which can be up- 
dated sequentially in O(m?) operations, and in parallel in O(m) operations using O(m) 
processors. This decomposition has applications in other areas of signal processing. We 
are investigating a sequential implementation of the method. A parallel implementation 
has been debugged on a simulator, we are preparing to move it to the WARP. 


Parallel] QR factorization. A project on parallel QR factorization has been com- 
pleted. A parallel Gram-Schmidt algorithm and a parallel Householder algorithm have 


been developed and programmed, and analytical models for the time complexity of 
these algorithms have been developed. The models were validated over a wide range of 
parameter values for floating point and communication speed through experiments on 
the ZMOB, MCMOB, a 16 processor Butterfly with hardware floating point, and a 128 
processor Butterfly with software floating point. 


Interprocessor communication. It has long been known that there is a close 
relationship between granularity of communication and granularity of computation. 
Roughly speaking the coarser the granularity of communication, the coarser the gran- 
ularity of computation must be to compensate. In a paper to appear in Parallel Com- 
puting, we investigate this phenomenon theoretically and empirically. The conclusion is 
that to solve the large problems that tomorrow’s generation of parallel computers can 
hold, we must have fine grained communication. 


Polynomial preconditioners for conjugate gradient algorithms. Precondi- 
tioning to produce a more favorable distribution of eigenvalues is essential in using the 
conjugate gradient algorithm to solve linear systems of equations. The choice of pre- 
conditioner must be well matched to the problem and to the computer architecture. 
Polynomial preconditioning is a useful tool in the effective use of the conjugate gra- 
dient algorithm on special architectures such as message passing parallel computers, 
machines with hierarchical memory, vector processors, and machines with very limited 
memory. We have developed a new adaptive algorithm that uses a polynomial] based on 
the residual polynomial from k steps of the conjugate gradient algorithm. This precon- 
ditioning requires no prior information about the matrix and is efficient on a variety of 
architectures. 


Eigenvalues of Arrowhead Matrices. A query from a physicist led us to consider 
the problem of finding the eigenvalues and eigenvectors of a symmetric matrix with 
nonzeroes only in the main diagonal and the last row and column. A highly parallel 
algorithm was developed and an error analysis was completed. 

Projection methods for eigenvalue problems. Projection methods, such as 
Kaczmarz’s algorithm are promising for very large eigenvalue problems, since they use 
comparatively little storage and access the matrix only one row at a time. Kaczmarz’s 
method for inhomogeneous systems has been extensively analyzed. In spite of this, 
very little is known about the rate of convergence of the method. Using a relation 
between Kaczmarz’s method and SOR, we conjecture that the convergence rate should 
be approximately 


Lr a 
1 + On-1 : 


where oy- 1 is the second smallest singular value of A— AJ and 4 is the eigenvalue whose 
eigenvector is to be found. 
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Markov chains. Although the theory of Gaussian elimination for the solution 
of Markov chains insures that it will perform well on chains with balanced transition 
probabilities, the method fails for nearly uncoupled Markov chains. We have given a 
formal rounding-error unalysis of a variant of Gaussian elimination which works for 
these chains. 

Two students completed Ph.D. theses in the study of Markov chains. Xiaobai Sun 
presented a unified framework for studying various aggregation algorithms and used it to 
study the convergence rate of these algorithms. Pil Park studied iterative algorithms for 
overflow queuing networks, and achieved good results with a combination of projection 
and preconditioning. 


Perturbation theory. Six problems in matrix perturbation theory have emerged 
from our work. They concern the condition of nearly uncoupled Markov chains, the 
computation of residual bounds for eigenvalues, the computation of residual bounds 
for singular values, the condition of multiple eigenvalues, the perturbations of nearly 
transient Markov chains, and the perturbation of matrix factorizations. Satisfactory 
solutions of these problems have been found. 


Constrained matrix Sylvester equations. The matrix Sylvester problem of 
finding a matrix T to satisfy AT + TF = C arises in applications in control problems 
involving the solution of ordinary differential equations. In the design of reduced order 
observers for loop transfer recovery, the solution T is further constrained to satisfy 
TB = 0, but the matrix C is the product of two matrices, on: to be determined. 
Questions of existence and uniqueness of solutions to such problems have been studied, 
and a computational algorithm has been developed and tested, and applied to design 
of reduced order observers that achieve loop transfer recovery in aircraft. 


Publication, etc. 


Technical reports 


1. G. W. Stewart, “An Iterative Method for Solving Linear Inequalities,” CS-TR- 
1833, University of Maryland, April, 1987 
This paper describes a method for solving homogeneous linear inequalities. The 
numerical techniques required by the algorithm can be parallelized and are im- 
portant in a number of other applications. 


2. Gene H. Golub and Dianne P. O’Leary, “Some History of the Conjugate Gradient 
and Lanczos Algorithms: 1948-1976,” CS-TR-1859 and UMIACS-TR-87-20, Uni- 
versity of Maryland, June, 1987. 


This manuscript gives some of the history of the conjugate gradient and Lanczos 
algorithms and an annotated bibliography for the period 1948-1976. 
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3. Robert A. van de Geijn, “Implementing the QR-Algorithm on an Array of Pro- 
cessors,” CS-TR-1897, University of Maryland, August, 1987. 


QR-algorithms for solving the algebraic eigenvalue problem that initially reduce 
the matrix to upper Hessenberg form and utilize traditional shifting strategies do 
not lend themselves to efficient implementation on a grid of procesoors. In this 
thesis, we introduce a variation of the QR-algorithm that works with the full ma- 
trix and show how it can be implemented on a square array of processors. By 
using a deferred shifting scheme, iterations can be pipelined, thereby reducing 
processor ‘dle time. A thorough analysis of deferred shifting techniques show that 
the asymptotic convergence rate remains acceptable. 


4. Zhaojun Bai, “The Direct GSVD Algorithm and Its Parallel Implementation,” 
CS-TR-1901, University of Maryland, August, 1987. 


The generalized singular value decomposition (GSVD) is the simultaneous reduc- 
tion of any two matrices having the same number of columns to diagonal matrices 
by premultiplying by two different orthogonal matrices and postmultiplying by 
the same nonsingular matrix. Following the work of C. C. Paige on the sequential 
Jacobi-like GSVD algorithm, we first provide a clearer description of his algorithm 
and a more straightforward proof. A new version of the direct GSVD algorithm, 
called the direct GSVD algorithm, is given. The error analysis shows that the 
algorithm is stable in the presence of rounding errors. 


A parallel implementation of the direct GSVD algorithm is proposed in the second 
part of the paper. The parallel algorithm falls into two parts. The first is that the 
input matrices are preprocessed in parallel by computing their upper trapezoidal 
forms, for which we develop a parallel QR decomposition with column pivoting. 
The second is the parallel computations of the GSVD of two upper trapezoidal 
matrices. 


Finally, the direct GSVD algorithm is used to derive an efficient method for the 
problem of equality-constrained least squares, and show how effective the GSVD is 
in dealing with a linearly constrained Gauss-Markov model. The GSVD provides 
an efficient algorithm and reveals the structure of the model more clearly than the 
usual general inverse expressions. 


5. Zhaojun Bai “Data-flow Algorithm for Computing the Singular Value Decompo- 
sition,” CS-TR-1941 University of Maryland, August, 1987. 


In this report, a data-flow algorithm for computing the singular value decompo- 
sition (SVD) of any matrix is developed. It contains the description of a prepro- 
cession QRD algorithm for QR decomposition and SVD algorithm for the SVD of 
a triangular matrix. Asymptotic convergence rate in parallel odd-even ordering 
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is examined. Moreover, the computational network, assignment of the computa- 
tional nodes, and data communication complexity are also presented. 


6. Zhaojun Bai, “Numerical Treatment of Restricted Gauss-Markov Model,” CS-TR- 
1942 University of Maryland, August, 1987. 


The singular value decomposition has been widely used in the ordinary linear 
model and other statistical problems. In this paper, we shall introduce the gen- 
eralized singular value decomposition of any two matrices X and 1] having the 
same number of columns to treat large scale restricted Gauss-Markov models 
(y, Xbeta = r,sigma7I). This method leads to an efficient, numerically stable 
and easily programmed algorithm for the best linear unbiased estimator in Jarge 
scale restricted Gauss-Markov linear models. The added information and sensi- 
tivity that it provides may be very useful in understanding the problem. 


7. G. W. Stewart, “The Crab: a Dialogue,” University of Maryland CS-TR-2023, 
May 1988. 


The CRAB is a module for building memory connected parallel systems in which 
processors that are connected to one another can read and write each other’s 
memory. This arrangement solves some of the communication problems associated 
with more conventional message passing systems. In particular, it is possible for 
a line of processors to pipeline data, altering the data items as they pass by. This 
report contains a dialogue about the CRAB, describing what it is and what it can 
do. 


8. G. W. Stewart, “On Scaled Projections and Pseudo-Inverses,” CS-TR 2026, May, 
1988. 


Let X be a matrix of full column rank and let D be a diagonal matrix with positive 
diagonal elements. The weighted pseudo-inverse defined by x} =(XTDX)71XTD 


and the associated oblique projection Pp = xx}, arise in many applications. In 
this paper, we show that the norms of both matrices are bounded by numbers 
that are independent of D. 


9. Dianne Ρ, O’Leary and Peter Whitman, “Parallel QR Factorization by House- 
holder and Modified Gram-Schmidt Algorithms,“ CS-TR 2119, UMIACS-TR-88- 
78, October, 1988. 

In this paper, the parallel implementation of two algorithms for forming the QR 
factorization of a matrix is studied. We propose parallel algorithms for the mod- 
ified Gram-Schmidt and the Householder algorithms on message passing systems 
in which the matrix is distributed by blocks of rows. The models that predict 
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performance of the algorithms are validated by experimental results on several 
parallel machines. 


10. G. W. Stewart, “Stochastic Perturbation Theory,” CS-TR 2129, UMIACS-TR-88- 
76, October, 1988. 


In this paper, classical matrix perturbation theory is approached from a proba- 
bilistic point of view. The perturbed quantity is approximated by a first order 
perturbation expansion, in which the perturbation is assumed to be random. This 
permits the computation of statistics estimating the variation in the perturbed 
quantity. Up to the higher order terms that are ignored in the expansion, these 
statistics tend to be more realistic than perturbation bounds obtained in terms of 
norms. The technique is applied to a number of problems in matrix perturbation 
theory, including least squares and the eigenvalue problem. 


11. G. W. Stewart, “Communication and Matrix Computations on Large Message 
Passing Systems,” CS-TR-2135, University of Maryland, October, 1988. 


This paper is concerned with the consequences for matrix computations of having 
a rather large number of general purpose processors, say ten or twenty thousand, 
connected in a network in such a way that a processor can communicate only 
with its immediate neighbors. Certain communication tasks associated with most 
matrix algorithms are defined and formulas developed for the time required to 
perform them under several communication regimes. The results are compared 
with the times for a nominal πη floating point operations. The results suggest 
that it is possible to use a large number of processors to solve matrix problems at 
a relatively fine granularity. 


12. D. P. O’Leary and G. W. Stewart, “Computing the Eigenvalues and Eigenvectors 
of Arrowhead Matrices,” CS-TR-2203, University of Maryland, February 1989. 


This paper treats the eigenvalue problem for a symmetric matrix which is zero 
except for its main diagonal and one row and column. Such problems arise in 
the description of radiationless transitions in isolated molecules and of oscillators 
vibrationally coupled with a Fermi liquid. In these applications the order n of 
the matrix A can be in the thousands. The purpose of this paper is to present 
formulas and efficient and highly parallel algorithms for computing eigenvalues 
and eigenvectors of such matrices. 


13. Dianne P. O’Leary, “On Bounds for Scaled Projections and Pseudo-Inverses,” 
UMIACS-TR-89-32 CS-TR-2215, University of Maryland February, 1989. 


Let X be a matrix of full column rank and let D be a positive definite diagonal 
matrix. In a recent paper, Stewart considered the weighted pseudo-inverse ap = 
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14. 


15. 


16. 


17. 
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(X7TDX)-1XTD and the associated oblique projection Pp = ΧΑ, and gave 
bounds, independent of D, for the norms of these matrices. In this note, we 
answer a question he raised by showing that the bounds are computable. 


Dianne P. O’Leary, “On Iteratively Reweighted Linear Least Squares Problems,” 
CS-TR-2278, UMIACS 89-66, University of Maryland, July, 1989. 


Several variants of Newton’s method are used to obtain estimates of solution vec- 
tors and residual vectors for the linear model Az = b+e = biyue using an iteratively 
reweighted lease squares criterion, which tends to diminish the influence of outliers 
compared with the standard least squares criterion. Algorithms appropriate for 
dense and sparse matrices are presented. Solving Newton’s linear system using 
updated matrix factorizations or the (unpreconditioned) conjugate gradient iter- 
ation gives the most effective algorithms. Four weighting functions are compared, 
and results are given for sparse well-conditioned and ill-conditioned problems. 


G. W. Stewart, “Perturbation Theory and Least Squares with Errors in the Vari- 
ables,” UMIACS-TR-89-97, CS-TR 2326, University of Maryland, October, 1989. 


In this note we examine what matrix perturbation theory has to say about or- 
dinary least squares estimation when the regression matrix is contaminated by 
random errors. The conclusion is that there is a regime in which the errors can 
have important, even overwhelming effects yet do not affect the validity of or- 
dinary least squares procedures. The boundary of this regime is indicated by a 
diagnostic number which can be calculated from the data. 


G. W. Stewart, “Two Simple Residual Bounds for Eigenvalues of Hermitian Ma- 
trices,” UMIACS-TR-89-123, CS-TR 2364, December, 1989. 

Let A be Hermitian and let the orthonormal columns of X span an approximate 
invariant subspace of X. Then the residual R = AX - XM (Δ = X#X) will be 
small. The theorems of this paper bound the distance of the spectrum of Af from 
the spectrum of A in terms of appropriate norms of R. 


G. W. Stewart, “On the Sensitivity of Nearly Uncoupled Markov Chains,” UMIACS- 
TR-90-18, CS-TR 2406, February, 1990. 


Nearly uncoupled Markov chains (aka nearly completely decomposable Markov 
chains) arise in a variety of applications, where they model loosely coupled sys- 
tems. In such systems it may be difficult to determine the transitions probabilities 
with high accuracy. This paper investigates the sensitivity of the limiting distribu- 
tion of the chain to perturbations in the transition probabilities. The conclusion 
is that nearly uncoupled Markov chains are quite sensitive to such perturbations 
but the perturbation of the limiting distribution is not arbitrary. 


“I 


18. G. W. Stewart and G. Zhang, “Eigenvalues of Graded Matrices and the Condition 

Numbers of a Multiple Eigenvalue,” UMIACS TR 90-31, CS TR 2420, February 
1990. 
This paper concerns two closely related topics: the behavior of the cigenvalues 
of graded matrices and the perturbation of a nondefective multiple eigenvalue. 
We will show that the cigenvalues of a graded matrix tend to share the graded 
structure of the matrix and give precise conditions insuring that this tendency is 
realized. These results are then applied to show that the secants of the canonical 
angles between the left and right invariant subspaces of a multiple eigenvalue tend 
to characterize its behavior when its matrix is slightly perturbed. 


19. Dianne P. O’Leary, “Yet Another Polynomial Preconditioner for the Conjugate 
Gradient Algorithm,” CS-TR-2425, UMIACS 90-36, University of Maryland, March, 


1990. 
Polynomial preconditioning is a useful tool in the effective use of the conjugate 
gradient algorithm on special architectures such as mess- sing parallel com- 


puters, machines with hierarchical memory, vector pr ὁ 18, and machines with 
very limited memory. In this work we investigate the use of a new adaptive algo- 
rithm which uses the polynomial preconditioner based on the residual polynomial 
from k steps of the conjugate gradient algorithm. 


20. G. W. Stewart, “An Updating Algorithm for Subspace Tracking,” UMIACS-TR- 
90-86, CS-TR 2494, July 1990. 


In certain signal processing applications it is required to compute the null space 
of a matrix whose rows are samples of a signal. The usual tool for doing this is 
the singular value decomposition. However, the singular value decomposition has 
the drawback that it requires O(p?) operations to recompute when a new sample 
arrives. In this paper, we show that a different decomposition, called the URV, 
decomposition is equally effective in exhibiting the null space and can be updated 
in O(p*) time. The updating technique can be run on a linear array of p processors 
in O(p) time. 


21. G. W. Stewart and G. Zhang, “On a Direct Method for the Solution of Nearly 
Uncoupled Markov Chains,” UMIACS TR 90-95, CS TR 2504, July 1990. 


This note is concerned with the accuracy of the solution of nearly uncoupled 
Markov chains by a direct method based on the LU decomposition. It is shown 
that plain Gaussian elimination may fail in the presence of rounding errors. A 
modification of Gaussian elimination with diagonal pivoting as well as corrections 
of small pivots by sums of off-diagonal elements in the pivoting columns is proposed 
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22: 


23. 


24. 


26. 


and analyzed. It is shown that the accuracy of the solution is affected by two 
condition numbers associate with the aggregate and the coupling respectively. 


G. W. Stewart, “Perturbation Theory for the Singular Value Decomposition,” 
UMIACS-TR-90-124, CS-TR 2539, September, 1990. 


The singular value decomposition has a number of applications in digital signal 
processing. Ilowever, the the decomposition must be computed from a matrix 
consisting of both signal and noise. It is therefore important to be able to assess 
the effects of the noise on the singular values and singular vectors — a problem in 
classical perturbation theory. In this paper we survey the perturbation theory of 
the singular value decomposition. 


Jewel B. Barlow, Moghen M. Monahemi, and Dianne P. O'Leary, “Constrained 
matrix Liapunov equations,” UMIACS-91-16, CS-2599, January 1991. 


We consider the problem of finding matrices L and T satisfying TA—FT = LC and 
TB =0. We establish existence conditions for the solution, derive an algorithm for 
computing the solution, and discuss conditions under which the matrix [C7,T7] 
is full rank. The problem arises in control theory, in the design of reduced order 
observers that achieve loop transfer recovery. 


Jewel B. Barlow, Moghen M. Monahemi, and Dianne P. O’Leary, “The design of 
reduced-order Luenberger observers with precise LTR,” UMIACS-91-17, CS-2600, 
January 1991. 


This work concerns the design of reduced-order observers for controllable, observ- 
able, and regular systems in which the number of measurements is more than the 
number of controls. It uses eigenstructure assignment whereas other approaches 
use Kalman filter (LQG/LTR) methods. The advantages of this approach are 
precise rather than approximate LTR, no restriction to minimum phase systems, 
finite rather than infinite observer gain, and simpler and more efficient numerical 
calculation. Case studies are presented illustrating these features. 


. G. W. Stewart ”On an Algorithm for Refining a Rank-Revealing URV Factoriza- 


tion and a Perturbation Theorem for Singuiar Values” UMIACS-TR-01-38, CS-TR 
2626, March 1991 


In this note we consider an iterative algorithm for moving a triangular matrix 
toward diagonality. The method is shown to converge under conditions that are 
likely to be met in practice. A result of the convergence proof in a new perturba- 
tion theorem for singular values. 


G. W. Stewart "Updating A Rank-Revealing ULV Decomposition” UMIACS-TR- 
91-39, CS-TR 2627, March 1991 
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28. 


20. 


A ULV decomposition of a matrix A of order n is a decomposition of the form 
A= ULV", where U and V are orthogonal matrices and L is a lower triangular 
matrix. When A is approximately of rank k, the decomposition is rank revealing 
if the last n — k rows of L are small. This paper presents algorithms for updating 
a rank-revealing ULV decomposition. The algorithms run in O(n?) time, and can 
be implemented on a linear array of processors to run in O(n) time 


. G. Aaams, M. F. Griffin, and G. W. Stewart ”Direction-of-Arrival Estimation Us- 


ing the Rank-Revealing URV Decomposition” UMIACS-TR 91-46, CS-TR-2640, 
March 1991 


An algorithm for updating the null space of a matrix is described. The algorithm 
is based on a new decomposition, called the URV decomposition, which can be 
updated in O(N?) and serves as an intermediary between the QR decomposition 
and the singular value decomposition. The URV decomposition is applied to a 
high-resolution direction of arrival problem based on the MUSIC algorithm. A 
virtue of the updating algorithm is the running estimate of rank. 


G. W. Stewart "Error Analysis of QR Updating with Exponential Windowing” 
UMIACS-TR-91-79, CS-TR 2685, May 1991 


Exponential windowing is a widely used technique for suppressing the effects 
of old data as new data is added to a matrix. Specifically, given an n x p 
matrix X, and a “forgetting factor” 6 € (0,1), one works with the matrix 
diag(8"-1, B"-*,...,1)Xy. In this paper we examine an updating algorithm for 
computing the QR factorization of diag(6"~!, @"-*,...,1)X, and show that it is 
unconditionally stable in the presence of rounding errors. 


Pill Seong Park, “Iterative Solution of Sparse Singular Systems of Equations Aris- 
ing from Queuing Networks,” UMIACS-TR-91-83, CS-TR-2690, June 1991 : 
Iterative methods for solving large sparse singular systems Ar = Ὁ arising from 
queuing networks having overflow capacity are presented. For such overflow mod- 
els, no analytic solution exists and the Kolmogorov balance equation has to be 
solved explicitly. The resulting matrix is irreducible, non-symmetric, and has a 
one dimensional null space. However, the matrix is sparse, highly structured, and 
has property-A. 


We transform the problem into an eigenvalue problem Tz = z, where the eigen- 
vector corresponding to the eigenvalue 1 is the desired solution. The choice of the 
Jacobi iteration leads to a 2-cyclic algorithm which reduces the necessary amount 
of work by 1/2, and computation of the residual needs no extra work. 


Inspired by the similarity between Markov models of queuing networks and the 
grid problems arising from discretization of partial differential equations, a few 
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ageregation/disaggregation(A/D) type methods with some ideas from geometric 
multigrid methods are considered. 


More effective methods to accelerate convergence of the 2-cyclic algorithm are 
discussed. The methods employ orthogonal projectors onto the subspace spanned 
by dominant eigencomponents in the residual. The projection step is further re- 
fined by Arnoldi’s method without extra matrix-vector multiplication by using 
the results of power iterations. Adopting the Chebyshev iterations as a main 
driving force combined with power iterations and projection steps refined by 
Arnoldi’s method, the resulting hybrid algorithm outperforms the Chebyshev iter- 
ation methods with optimal parameters. We study the convergence of the hybrid 
algorithm and look for conditions when the projection step can accelerate conver- 
gence of the underlying method. Numerical experiments provide further evidence 
that the methods can be quite efficient, especially for harder problems. 


30. G. W. Stewart ”Perturbation Theory for Rectangular Matrix Pencils” UMIACS- 
TR-91-105, CS-TR 2721, July 1991 
The theory of eigenvalues and eigenvectors of rectangular matrix pencils is ‘com- 
plicated by the fact that arbitrarily small perturbations of the pencil can cause 
them disappear. However, there are applications in which the properties of the 
pencil ensure the existence of eigenvalues and eigenvectors. In this paper it is 
shown how to develop a perturbation theory for such pencils. 


31. Per Christian Hansen and Dianne Prost O’Leary, “The use of the L-curve in the 
regularization of discrete ill-posed problems,” UMIACS-91-142, CS-2781, October 
1991. 


In order to produce reasonable solutions to ill-posed problems, regularization 
algorithms are often used. The L-curve is a plot—for all valid regularization 
parameters—of the size of the regularize d solution versus the size of the corre- 
sponding residual. We establish two main results. First we give a unifying charac 
terization of various regularization methods and show that the measurement of 
“size” is dependent on the particular regularization method chosen; for example, 
the 2-norm is appropriate for Tikhonov regularization, but a 1-norm in the coor- 
dinate system of the singular value decomposition (SVD) is relevant to truncated 
SVD regularization . Secondly, we propose a new method for choosing the reg- 
ularization parameter based on the L-curve, and show how this method can be 
implemented efficiently. We compare the method to generalized cross validation 
and demonstrate that our new method is more robust in the presence of correlated 
errors. 


32. Chiou-Ming Hnang and Dianne P. O’Leary, A Krylov Multisplitting Algorithm for 
Solving Linear Systems of Equations Inst. for Mathematics and Its Applications, 
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University of Minnesota Preprint 992, July 1992. 

We consider the practical implementation of Krylov subspace methods (conjugate 
gradients, GMRES, etc.) for parallel computers in the case where the precondi- 
tioning matrix arises from a multisplitting. We show that the algorithm can be 
efficiently implemented by dividing the work into tasks that generate search direc- 
tions and a single task that minimizes over the resulting subspace. Each task is 
assigned to a subset of processors. It is not necessary for the minimization task to 
send frequent information to the direction generating tasks, and this leads to high 
utilization with a minimum of synchronization. We study the convergence prop- 
erties of various forms of the algorithm and present results of numerical examples 
on a sequential computer. We consider the practical implementation of Krylov 
subspace methods (conjugate gradients, GMRES, etc.) for parallel computers in 
the case where the preconditioning matrix arises from a multisplitting. We show 
that the algorithm can be efficiently implemented by dividing the work into tasks 
that generate search directions and a single task that minimizes over the resulting 
subspace. Each task is assigned to a subset of processors. It is not necessary for 
the minimization task to send frequent information to the direction generating 
tasks, and this leads to high utilization with a minimum of synchronization. We 
study the convergence properties of various forms of the algorithm and present 
results of numerical examples on a sequential computer. 


33. J. Barlow, M. Monahemi, and D. P. O’Leary, Constrained Matrix Sylvester Equa- 
tions Computer Science Department Report CS-?, Institute for Advanced Com- 
puter Studies Report UMIACS-91-?, University of Maryland, 1991. We consider 
the problem of finding matrices Z and T satisfying TA -- FT = LC and TB = 0. 
We establish existence conditions for the solution, derive an algorithm for com- 
puting the solution, and discuss conditions under which the matrix [C7,T7] is 
full rank. The problem arises in control theory, in the design of reduced order 
observers that achieve loop transfer recovery. 


34. J. Barlow, M. Monahemi, and D. P. O’Leary, The Design of Reduced-Order 
Observers with Precise Loop Transfer Recovery, Computer Science Department 
Report CS-?, Institute for Advanced Computer Studies Report UMIACS-92-?, 
University of Maryland, 1991 This paper concerns the design of reduced-order ob- 
servers for systems in which the number of measurements is more than the number 
of controls. We develop an algorithm that applies to regular systems that have no 
transmission zeroes. The algorithm uses eigenstructure assignment whereas other 
approaches use Kalman filter methods. The advantages of this approach are the 
following: i) precise loop transfer recovery rather than approximate loop transfer 
recovery, ii) finite observer gain rather than asymptotic observer gain, iii) modest 
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computational tools and operations counts. Case studies are presented illustrating 
these features. 


35. Dianne P. O'Leary, Iterative Methods for Finding the Stationary Vector for Markov 
Chains, Institute for Mathematics and Its Applications, University of Minnesota, 
Preprint 932, February 1992 


This overview concerns methods for estimating the steady-state vector of ergodic 
Markov chains. The problem can be cast as an ordinary eigenvalue problem, but 
since the cigenvalue is known, it can equally well be studied as a nullspace prob- 
lem or as a linear system. We discuss iterative methods for each of these three 
formulations. Many of the applications, such as queuing modeling, have special 
structure that can be exploited computationally, and we give special emphasis to 
three ideas for exploiting this structure: decomposibility, separability, and multi- 
level aggregation. Such ideas result in a large number of diverse algorithms, many 
of which are poorly understood. 


36. K.J.R. Liu, G.W. Stewart, and Y.-J. J. Wu, URV ESPRIT for Tracking Time- 
Varying Signals Computer Science Department Report CS-?, Institute for Ad- 
vanced Computer Studies Report UMIACS-92-?, University of Maryland, October 
1992. 


ESPRIT is an algorithm for determining the fixed directions of arrival of a set 
of narrowband signals at an array of sensors. Unfortunately, its computational 
burden makes it unsuitable for real time processing of signals with time-varying 
directions of arrival. In this work we develop a new implementation of ESPRIT 
that has potential for real time processing. It is based on a rank-revealing URV 
decomposition, rather than the eigendecomposition or singular value decompo- 
sition used in previous ESPRIT algorithms. We demonstrate its performance 
on simulated data representing both constant and time-varying signals. We find 
that the URV-based ESPRIT algorithm is effective for estimating time-varying 
directions-of-arrival at considerable computational savings over the SVD-based 
algorithm. 


37. G. W. Stewart, On the Perturbation of Markov Chains with Nearly Transient 
States, UMIACS-TR-92-14, CS-TR-2835, January 1992. 


Let A be an irreducible stochastic matrix of the form 
An Ey2 
A= ᾿ 
( An A22 


If E22 were zero, the states corresponding to 4.22 would be transient in the sense 
that if the steady state vector y! is partitioned conformally in the form (yj? yf) 
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then yt = 0. If 1122 is small, then ys will be small, and the states are said to 
be nearly transient. It this paper it is shown that small relative perturbations in 
Αι. «121, and Age, though potentially larger than ys, induce only small relative 
perturbations in ys 


2" 


38. G. W. Stewart, On the Perturbation of LU, Cholesky, and QR Factorizations, 
UMIACS-TR-92-24, CS-TR-2848, February 1992 
In this paper error bounds are derived for a first order expansion of the LU fac- 
torization of a perturbation of the identity. The results are applied to obtain 
perturbation expansions of the LU, Cholesky, and QR factorizations. | 


39. G. W. Stewart, Updating URV Decompositions in Parallel, UMIACS-TR-92-44, 
CS-TR-2880, April 1992 
A URV decomposition of a matrix is a factorization of the matrix into the product 
of a unitary matrix (U), an upper triangular matrix (R), and another unitary 
matrix (V). In an earlier paper [UMIACS-TR-90-86] it was shown how to update 
a URV decomposition in such a way that it reveals the effective rank of the matrix. 
It was also argued that the updating procedure could be implemented in parallel 
on a linear array of processors; however, no specific algorithms were given. This 
paper gives a detailed implementation of the updating procedure. 


40. Z. Bai and G. W. Stewart, SRRIT— A FORTRAN Subroutine to Calculate the 
Dominant Invariant, Subspace of a Nonsymmetric Matrix, UMIACS TR-92-61, 
CS TR-2908, May, 1992 


SRRIT is a FORTRAN program to calculate an approximate orthonormal basis for 
a dominant invariant subspace of a real matrix A by the method of simultaneous 
iteration Specifically, given an integer m, SRRIT attempts to compute a matrix Q 
with m orthonormal columns and real quasi-triangular matrix T of order m such 
that the equation 


AQ = QT 


is satisfied up to a tolerance specified by the user. The eigenvalues of T are 
approximations to the m largest eigenvalues of A, and the columns of Q span the 
invariant subspace corresponding to those eigenvalues. SRRIT references A only 
through a user provided subroutine to form the product AQ; hence it is suitable 
for large sparse problems. 


41. G. W. Stewart, Determining Rank in the Presence of Error, UMIACS TR-92-108, 
CS TR-2972, October 1992 


The problem of determining rank in the presence of error occurs in a number 
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of applications. The usual approach is to compute a rank-revealing decompo- 
sition and make a decision about the rank by examining the small elements of 
the decomposition. In this paper we look at three commonly use decompositions: 
the singular value decomposition, the pivoted QR decomposition, and the URV 
decomposition. 
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